Search and Learning for the Linear Ordering Problem with an Application to Machine Translation

نویسندگان

  • Roy Wesley Tromble
  • Jason Eisner
  • Sanjeev Khudanpur
  • Chris Callison-Burch
چکیده

This dissertation is about ordering. The problem of arranging a set of n items in a desired order is quite common, as well as fundamental to computer science. Sorting is one instance, as is the Traveling Salesman Problem. Each problem instance can be thought of as optimization of a function that applies to the set of permutations. The dissertation treats word reordering for machine translation as another instance of a combinatorial optimization problem. The approach introduced is to combine three different functions of permutations. The first function is based on finite-state automata, the second is an instance of the Linear Ordering Problem, and the third is an entirely new permutation problem related to the LOP. The Linear Ordering Problem has the most attractive computational properties of the three, all of which are NP-hard optimization problems. The dissertation expends significant effort developing neighborhoods for local search on the LOP, and uses grammars and other tools from natural language parsing to introduce several new results, including a state-of-the-art local search procedure. Combinatorial optimization problems such as the TSP or the LOP are usually given the function over permutations. In the machine translation setting, the weights are not given, only words. The dissertation applies machine learning techniques to derive a LOP from each given sentence using a corpus of sentences and their translations for training. It proposes a set of features for such learning and argues their propriety for translation based on an analogy to dependency parsing. It adapts a number of parameter optimization procedures to the novel setting of the LOP. The signature result of the thesis is the application of a machine learned set of linear ordering problems to machine translation. Using reorderings found by search as a preprocessing step significantly improves translation of German to English, and significantly more than the lexicalized reordering model that is the default of the translation system. In addition, the dissertation provides a number of new theoretical results, and lays out an ambitious program for potential future research. Both the reordering model and the optimization techniques have broad applicability, and the availability of machine learning makes even new problems without obvious structure approachable.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

APPLICATION OF THE HYBRID HARMONY SEARCH WITH SUPPORT VECTOR MACHINE FOR IDENTIFICATION AND CALSSIFICATION OF DAMAGED ZONE AROUND UNDERGROUND SPACES

An excavation damage zone (EDZ) can be defined as a rock zone where the rock properties and conditions have been changed due to the processes related to an excavation. This zone affects the behavior of rock mass surrounding the construction that reduces the stability and safety factor and increase probability of failure of the structure. This paper presents an approach to build a model for the ...

متن کامل

Application of the Extreme Learning Machine for Modeling the Bead Geometry in Gas Metal Arc Welding Process

Rapid prototyping (RP) methods are used for production easily and quickly of a scale model of a physical part or assembly. Gas metal arc welding (GMAW) is a widespread process used for rapid prototyping of metallic parts. In this process, in order to obtain a desired welding geometry, it is very important to predict the weld bead geometry based on the input process parameters, which are voltage...

متن کامل

A Hybrid Machine Learning Method for Intrusion Detection

Data security is an important area of concern for every computer system owner. An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Already various techniques of artificial intelligence have been used for intrusion detection. The main challenge in this area is the running speed of the available implemen...

متن کامل

An improved memetic algorithm to minimize earliness–tardiness on a single batch processing machine

In this research, a single batch processing machine scheduling problem with minimization of total earliness and tardiness as the objective function is investigated.We first formulate the problem as a mixed integer linear programming model. Since the research problem is shown to be NP-hard, an improved memetic algorithmis proposed to efficiently solve the problem. To further enhance the memetic ...

متن کامل

Lot Streaming in No-wait Multi Product Flowshop Considering Sequence Dependent Setup Times and Position Based Learning Factors

This paper considers a no-wait multi product flowshop scheduling problem with sequence dependent setup times. Lot streaming divide the lots of products into portions called sublots in order to reduce the lead times and work-in-process, and increase the machine utilization rates. The objective is to minimize the makespan. To clarify the system, mathematical model of the problem is presented. Sin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009